THCHS-30: A Free Chinese Speech Corpus

Authors

Dong Wang

Xuewei Zhang

Abstract

Speech data is crucially important for speech recognition research. Quite a few speech databases can be purchased at prices that are reasonable for most research institutes. However, for young researchers who are just starting out, or for those who have only recently become interested in this direction, the cost of data remains an annoying barrier. We support the ‘free data’ movement in speech recognition: research institutes (particularly those supported by public funds) publish their data freely so that new researchers can obtain sufficient data to kick off their careers. In this paper, we follow this trend and release a free Chinese speech database, THCHS-30, that can be used to build a full-fledged Chinese speech recognition system. We report the baseline system established with this database, including its performance under highly noisy conditions.

Similar resources

A Super Phonetic System and Multi-dialect Chinese Speech Corpus for Speech Recognition

In this paper, we describe the work on Chinese multi-dialect speech processing. Based on the phonetic analysis of ten Chinese dialects, we have created a Chinese super phonetic system for Chinese speech recognition. To examine this phonetic system and develop Chinese dialect speech technology, we are building a multi-dialect speech corpus, which includes 10 dialect areas and 2000 speakers.

The Effect of Colligational Corpus-based Instruction on Enhancing the Pragmalinguistic Knowledge of Request Speech Act among Iranian Intermediate EFL Learners

This study investigated the effectiveness of colligational corpus-based instruction on enhancing the pragmalinguistic knowledge of speech act of request among Iranian intermediate EFL learners. The objective of the study was to find out whether or not providing students with corpora through using colligational instruction had any significant effects on enhancing their pragmalinguistic knowledge...

PCFG Parsing for Restricted Classical Chinese Texts

The Probabilistic Context-Free Grammar (PCFG) model is widely used for parsing natural languages, including Modern Chinese. But for Classical Chinese, the computer processing is just commencing. Our previous study on the part-of-speech (POS) tagging of Classical Chinese is a pioneering work in this area. Now in this paper, we move on to the PCFG parsing of Classical Chinese texts. We continue t...

Construction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition

In this paper we describe the development of an annotated Chinese conversational textual corpus for speech recognition in a speech-to-speech translation system in the travel domain. A total of 515,000 manually checked utterances were constructed, which provided a 3.5 million word Chinese corpus with word segmentation and part-of-speech tagging. The annotation is conducted with careful manual ch...

Journal:

CoRR

Volume abs/1512.01882

Publication year: 2015
